Measuring Similarity between XML Documents
Abstract
With the advance of World Wide Web standards, XML documents have become popular in e-business applications for information exchange. Electronic catalogs and transaction records are now formatted in XML. XML documents are semi-structured documents in which XML schemas mark up the semantics. XML separates presentation from semantics, so the presentation of information on different devices can be processed independently of information management. However, information retrieval techniques for XML documents are still at an early stage. Most e-business applications use XML only for data interchange or presentation purposes. Advanced information retrieval techniques have not yet been applied to retrieving, organizing, and managing XML documents. In this paper, we propose a similarity measure between XML documents based on the Jaccard similarity function for unstructured text documents. Examples are provided to illustrate the formulation. Given the proposed similarity measure, traditional information retrieval techniques for unstructured text documents can also be applied to XML documents, which is expected to significantly improve the performance of XML document management.
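The abstract does not give the paper's actual formulation, so the following is only a minimal sketch of the general idea, not the authors' method. It applies the standard Jaccard coefficient, |A ∩ B| / |A ∪ B|, to sets of (element path, term) pairs extracted from each document. The choice of features and the names `path_term_features`, `jaccard`, and `xml_similarity` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def path_term_features(xml_text):
    """Return the set of (element path, lowercase term) pairs in a document.

    Assumption for illustration: a term is a whitespace-separated token of an
    element's leading text; attributes and tail text are ignored.
    """
    root = ET.fromstring(xml_text)
    features = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        for term in (node.text or "").split():
            features.add((path, term.lower()))
        for child in node:
            walk(child, path)

    walk(root, "")
    return features


def jaccard(a, b):
    """Standard Jaccard coefficient; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def xml_similarity(doc_a, doc_b):
    """Jaccard similarity over the path/term feature sets of two XML strings."""
    return jaccard(path_term_features(doc_a), path_term_features(doc_b))


# Two hypothetical catalog entries that differ in one term.
doc_a = "<item><name>red pen</name><price>2</price></item>"
doc_b = "<item><name>blue pen</name><price>2</price></item>"
print(xml_similarity(doc_a, doc_b))  # 2 shared features out of 4 distinct ones
```

Pairing each term with its element path means the same word under different tags counts as a different feature, which is one simple way to let the markup influence a measure originally defined for unstructured text.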
Similar Resources
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
As the number of XML documents grows, organizing them effectively in order to retrieve useful information from them is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
A Novel Approach to Measuring Structural Similarity between XML Documents
Measuring structural similarity between XML documents has become a key component in various applications, including XML mining, schema matching, and web service discovery, among others. This paper presents a novel structural similarity measure incorporating kernel methods into XML documents. Results on preliminary simulations show that this approach outperforms conventional ones.
Structural Similarity Evaluation Between XML Documents and DTDs
The automatic processing and management of XML-based data are increasingly popular research issues due to the increasingly abundant use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received strong attention. Among these is the process of matching XML documents and XML grammars, useful in various applications such as documents classifi...
A New Sequential Mining Approach to XML Document Similarity Computation
Measuring the structural similarity among XML documents is the task of finding their semantic correspondence and is fundamental to many web-based applications. While there exist several methods to address the problem, the data mining approach seems to be a novel, interesting and promising one. It works on the id...
Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
Publication date: 2004